Bayesian Inference of Visual Motion Boundaries
Abstract
This chapter addresses an open problem in visual motion analysis: the estimation of image motion in the vicinity of occlusion boundaries. With a Bayesian formulation, local image motion is explained in terms of multiple, competing, nonlinear models, including models for smooth (translational) motion and for motion boundaries. The generative model for motion boundaries explicitly encodes the orientation of the boundary, the velocities on either side, the motion of the occluding edge over time, and the appearance/disappearance of pixels at the boundary. We formulate the posterior probability distribution over the models and model parameters, conditioned on the image sequence. Approximate inference is achieved with a combination of tools: a Bayesian filter provides for online computation; factored sampling allows us to represent multimodal non-Gaussian distributions and to propagate beliefs with nonlinear dynamics from one time to the next; and mixture models are used to simplify the computation of joint prediction distributions in the Bayesian filter. To efficiently represent such a high-dimensional space we also initialize samples using the responses of a low-level motion discontinuity detector. The basic formulation and computational model provide a general probabilistic framework for motion estimation with multiple, nonlinear models.

1 Visual Motion Analysis

Motion is an intrinsic property of the world and an integral part of our visual experience. It provides a remarkably rich source of information that supports a wide variety of visual tasks, including 3D model acquisition, event detection, object recognition, temporal prediction, and oculomotor control. Visual motion has long been recognized as a key source of information for inferring the 3D structure of surfaces and the relative 3D motion between the observer and the scene (Gibson 1950; Ullman 1979; Longuet-Higgins and Prazdny 1980).
In particular, the 2D patterns of image velocity that are produced by an observer moving through the world can be used to infer the observer's 3D movement (i.e., egomotion). Many animals are known to use visual motion to help control locomotion and their interaction with objects (Gibson 1950; Warren 1995; Sun and Frost 1998; Srinivasan, Zhang, Altwein, and Tautz 2000). It is also well known that visual motion provides information about the 3D structure of the observed scene, including the depth and orientation of surfaces. Given the 3D motion of the observer and the 2D image velocity at each pixel, one can infer a 3D depth map. In particular, it is straightforward to show that the 2D velocities caused by the translational component of an observer's 3D motion are inversely proportional to surface depth (Heeger and Jepson 1992; Longuet-Higgins and Prazdny 1980).

While much of the research on visual motion has focused on the estimation of 2D velocity and the inference of egomotion and 3D depth, it is also widely recognized that visual motion conveys information about object identity and behavior. Many objects exhibit characteristic patterns of motion; examples include rigid and articulated motions, different types of biological motion, and trees blowing in the wind. From these different classes of motion we effortlessly detect and recognize objects, assess environmental conditions (e.g., wind, rain, and snow), and begin to infer and predict object behavior. This facilitates a broad range of tasks such as the detection and avoidance of collisions, chasing (or fleeing) other animate objects, and the inference of the activities and intentions of other creatures. Given the significance of visual motion, it is not surprising that it has become one of the most active areas of computer vision research.
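The inverse-depth relation mentioned above can be made concrete with a small numerical sketch. The function below evaluates the standard translational flow equations for a pinhole camera; the function name, the chosen focal length, and the sample pixel values are illustrative, not taken from the chapter.

```python
import numpy as np

def translational_flow(x, y, Z, T, f=1.0):
    """2D image velocity at pixel (x, y) induced by camera
    translation T = (Tx, Ty, Tz), for a pinhole camera with
    focal length f. Flow magnitude scales as 1/Z: halving the
    depth doubles the flow (Longuet-Higgins and Prazdny 1980)."""
    Tx, Ty, Tz = T
    u = (x * Tz - f * Tx) / Z  # horizontal image velocity
    v = (y * Tz - f * Ty) / Z  # vertical image velocity
    return u, v

# Same pixel and 3D translation, two depths: halving Z doubles the flow.
u1, v1 = translational_flow(0.2, 0.1, Z=4.0, T=(1.0, 0.0, 0.5))
u2, v2 = translational_flow(0.2, 0.1, Z=2.0, T=(1.0, 0.0, 0.5))
```

This is why, given known egomotion, a flow field determines a depth map up to a global scale: each pixel's velocity is simply divided by the known translational field to recover 1/Z.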
In just over two decades, the major foci of research on visual motion analysis have included:

Optical Flow Estimation: This refers to the estimation of 2D image velocities from image sequences (Horn 1986; Barron, Fleet, and Beauchemin 1994; Otte and Nagel 1994). Although originally viewed as a precursor to the estimation of 3D scene properties, the techniques developed to estimate optical flow have also proven useful for other registration problems; examples are found in medical domains, in video compression, in forming image mosaics (panoramas), and in stop-frame animation.

Motion-Based Segmentation: Although optical flow fields are clearly useful, they do not explicitly identify the coherently moving regions in an image, nor do they separate foreground and background regions. For these tasks, layered motion models and the Expectation-Maximization (EM) algorithm have become popular (Jepson and Black 1993; Sawhney and Ayer 1996; Vasconcelos and Lippman 2001; Weiss and Adelson 1996), as have many other approaches, such as automated clustering based on spatial proximity and motion similarity.

Egomotion and Structure-from-Motion: The estimation of self-motion and 3D depth from optical flow or tracked points over many frames has been one of the longstanding fundamental problems in visual motion analysis (Broida, Chandrashekhar, and Chellappa 1990; Heeger and Jepson 1992; Tomasi and Kanade 1992; Longuet-Higgins and Prazdny 1980). While typically limited to nearly stationary (rigid) environments, current methods for estimating egomotion can produce accurate results at close to video frame rates (e.g., see Chiuso, Favaro, Jin, and Soatto 2000).

Visual Tracking: Improvements in flow estimation have enabled visual tracking of objects reliably over tens, and often hundreds, of frames.
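The layered-motion idea behind EM-based segmentation can be illustrated with a toy sketch: flow vectors drawn from two coherently moving regions are softly assigned to two translational layers, and each layer's velocity is re-estimated from its assignments. This is a minimal stand-in under simplifying assumptions (two layers, constant-velocity models, isotropic Gaussian noise), not the algorithm of any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)

def em_two_layers(flows, iters=20, sigma=0.1):
    """Minimal EM for two translational motion layers.
    E-step: soft-assign each flow vector to a layer under
    isotropic Gaussian noise. M-step: re-estimate each layer's
    mean velocity from the soft assignments."""
    # Initialize the two layer velocities from the extremes of the data.
    means = flows[[flows[:, 0].argmin(), flows[:, 0].argmax()]]
    for _ in range(iters):
        # E-step: responsibilities proportional to the Gaussian likelihood
        d = ((flows[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        r = np.exp(-0.5 * d / sigma**2)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted mean velocity per layer
        means = (r[:, :, None] * flows[:, None, :]).sum(0) / r.sum(0)[:, None]
    return means, r

# Two coherently moving regions: background near (0, 0), foreground near (1, 0).
flows = np.vstack([rng.normal([0.0, 0.0], 0.05, (100, 2)),
                   rng.normal([1.0, 0.0], 0.05, (100, 2))])
means, resp = em_two_layers(flows)
```

The responsibilities `resp` give the soft foreground/background segmentation that a raw flow field alone does not provide.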
Often these methods are strongly model-based, requiring manual initialization and prior specification of image appearance and model dynamics (Irani, Rousso, and Peleg 1994; Shi and Tomasi 1994; Sidenbladh, Black, and Fleet 2000). One of the lessons learned from research on visual tracking is the importance of having suitable models of image appearance and temporal dynamics, whether learned prior to tracking (Hager and Belhumeur 1998; Black and Jepson 1998) or adaptively during tracking (Jepson, Fleet, and El-Maraghi 2001).

In this chapter we focus on the problem of estimating 2D image velocity, especially in the neighborhoods of surface boundaries.
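As a preview of the inference machinery summarized in the abstract, one step of factored sampling (resample by weight, propagate through possibly nonlinear dynamics, reweight by the observation likelihood) can be sketched on a 1D toy problem. The bimodal likelihood stands in for the multimodal posteriors that arise near a motion boundary; the function names, noise levels, and mode locations are all illustrative assumptions, not the chapter's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def factored_sampling_step(samples, weights, dynamics, likelihood):
    """One step of a factored-sampling (particle) filter:
    resample by weight, propagate through the dynamics,
    then reweight by the observation likelihood."""
    n = len(samples)
    idx = rng.choice(n, size=n, p=weights)   # resample
    predicted = dynamics(samples[idx])       # propagate (may be nonlinear)
    w = likelihood(predicted)                # reweight by observations
    return predicted, w / w.sum()

# Toy state: a 1D velocity whose likelihood has two narrow modes at +/-1,
# mimicking the two candidate velocities on either side of a boundary.
samples = rng.normal(0.0, 2.0, size=500)
weights = np.full(500, 1.0 / 500)
dynamics = lambda s: s + rng.normal(0.0, 0.1, size=s.shape)
likelihood = lambda s: (np.exp(-0.5 * ((s - 1.0) / 0.2) ** 2)
                        + np.exp(-0.5 * ((s + 1.0) / 0.2) ** 2))

for _ in range(10):
    samples, weights = factored_sampling_step(samples, weights, dynamics, likelihood)
```

After a few iterations the sample set concentrates around both modes simultaneously, which is exactly what a unimodal (e.g., Kalman) filter cannot represent.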